Overview
Dataset statistics
| Number of variables | 25 |
|---|---|
| Number of observations | 10227 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 1 |
| Duplicate rows (%) | < 0.1% |
| Total size in memory | 2.0 MiB |
| Average record size in memory | 208.0 B |
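As a rough consistency check on the memory figures above, the average record size multiplied by the number of observations should land near the reported total. The sketch below uses only the numbers from the table; the variable names are illustrative.

```python
# Figures copied from the dataset statistics table.
n_observations = 10227
avg_record_bytes = 208.0

# Total size implied by the average record size, converted to MiB (1 MiB = 2**20 bytes).
total_bytes = n_observations * avg_record_bytes
total_mib = total_bytes / 2**20

print(f"{total_mib:.1f} MiB")
```

The implied total rounds to the same one-decimal figure as the reported total size.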
Variable types
| Categorical | 14 |
|---|---|
| Text | 11 |
Alerts
| Alert | Type |
|---|---|
| Dataset has 1 (< 0.1%) duplicate rows | Duplicates |
| What programming language would you recommend an aspiring data scientist to learn first? is highly imbalanced (63.7%) | Imbalance |
| Have you ever used a TPU (tensor processing unit)? is highly imbalanced (56.7%) | Imbalance |
Reproduction
| Analysis started | 2024-11-04 16:43:35.148664 |
|---|---|
| Analysis finished | 2024-11-04 16:43:44.401435 |
| Duration | 9.25 seconds |
| Software version | ydata-profiling v4.12.0 |
| Download configuration | config.json |
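The reported duration can be recovered from the two timestamps in the table. A minimal sketch using only the standard library, with the timestamps copied as they appear above:

```python
from datetime import datetime

# Timestamps copied from the Reproduction table.
started = datetime.fromisoformat("2024-11-04 16:43:35.148664")
finished = datetime.fromisoformat("2024-11-04 16:43:44.401435")

# Elapsed wall-clock time between start and finish, in seconds.
duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")
```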
Variables
What is your age (# years)?
Categorical
| Distinct | 11 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 159.8 KiB |
| Value | Count |
|---|---|
| 25-29 | 2523 |
| 30-34 | 2064 |
| 35-39 | 1420 |
| 22-24 | 1306 |
| 40-44 | 969 |
| Other values (6) | 1945 |
Length
| Max length | 5 |
|---|---|
| Median length | 5 |
| Mean length | 4.9906131 |
| Min length | 3 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 22-24 |
|---|---|
| 2nd row | 40-44 |
| 3rd row | 22-24 |
| 4th row | 50-54 |
| 5th row | 22-24 |
Common Values
| Value | Count | Frequency (%) |
| 25-29 | 2523 | 24.7% |
| 30-34 | 2064 | 20.2% |
| 35-39 | 1420 | 13.9% |
| 22-24 | 1306 | 12.8% |
| 40-44 | 969 | 9.5% |
| 45-49 | 642 | 6.3% |
| 50-54 | 464 | 4.5% |
| 18-21 | 317 | 3.1% |
| 55-59 | 264 | 2.6% |
| 60-69 | 210 | 2.1% |
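The Frequency (%) column appears to be each count divided by the 10227 observations, rounded to one decimal. The sketch below works under that assumption (the helper name `frequency_pct` is illustrative) and reproduces the percentages printed for the lower rows of the table.

```python
# Counts copied from the Common Values table; n_observations comes from
# the dataset statistics.
n_observations = 10227
counts = {
    "25-29": 2523, "30-34": 2064, "35-39": 1420, "22-24": 1306,
    "40-44": 969, "45-49": 642, "50-54": 464, "18-21": 317,
    "55-59": 264, "60-69": 210,
}


def frequency_pct(count, total):
    """Share of rows holding this value, as a percentage with one decimal."""
    return round(100 * count / total, 1)


frequencies = {value: frequency_pct(c, n_observations) for value, c in counts.items()}
for value, pct in frequencies.items():
    print(f"{value}: {pct}%")
```

The listed counts sum to 10179 rather than 10227 because the table shows only ten of the eleven distinct values.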
Length
| Value | Count | Frequency (%) |
| 25-29 | 2523 | 24.7% |
| 30-34 | 2064 | 20.2% |
| 35-39 | 1420 | 13.9% |
| 22-24 | 1306 | 12.8% |
| 40-44 | 969 | 9.5% |
| 45-49 | 642 | 6.3% |
| 50-54 | 464 | 4.5% |
| 18-21 | 317 | 3.1% |
| 55-59 | 264 | 2.6% |
| 60-69 | 210 | 2.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| - | 10179 | 19.9% |
| 2 | 9281 | 18.2% |
| 4 | 8025 | 15.7% |
| 3 | 6968 | 13.7% |
| 5 | 6305 | 12.4% |
| 9 | 5059 | 9.9% |
| 0 | 3755 | 7.4% |
| 1 | 634 | 1.2% |
| 6 | 420 | 0.8% |
| 8 | 317 | 0.6% |
| Other values (2) | 96 | 0.2% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 51039 |
What is your gender?
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 159.8 KiB |
| Value | Count |
|---|---|
| Male | 8800 |
| Female | 1427 |
Length
| Max length | 6 |
|---|---|
| Median length | 4 |
| Mean length | 4.2790652 |
| Min length | 4 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Male |
|---|---|
| 2nd row | Male |
| 3rd row | Male |
| 4th row | Male |
| 5th row | Male |
Common Values
| Value | Count | Frequency (%) |
| Male | 8800 | 86.0% |
| Female | 1427 | 14.0% |
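The length statistics for this variable follow directly from the two value counts. A minimal sketch, assuming length is measured in characters per value, which reproduces the reported mean, minimum and maximum:

```python
# Value counts copied from the Common Values table.
counts = {"Male": 8800, "Female": 1427}

total_rows = sum(counts.values())
# Each value contributes its character length once per row it appears in.
total_chars = sum(len(value) * n for value, n in counts.items())

mean_length = total_chars / total_rows
min_length = min(len(value) for value in counts)
max_length = max(len(value) for value in counts)

print(total_chars, round(mean_length, 7), min_length, max_length)
```

The character total also matches the count shown under "Most occurring categories" for this variable.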
Length
Common Values (Plot)
| Value | Count | Frequency (%) |
| male | 8800 | 86.0% |
| female | 1427 | 14.0% |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 11654 | 26.6% |
| a | 10227 | 23.4% |
| l | 10227 | 23.4% |
| M | 8800 | 20.1% |
| F | 1427 | 3.3% |
| m | 1427 | 3.3% |
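The character table can be rebuilt by weighting each character of a value by that value's row count. The sketch below assumes case-sensitive counting, which is what the separate `M` and `m` rows suggest.

```python
from collections import Counter

# Value counts for this variable, copied from its Common Values table.
counts = {"Male": 8800, "Female": 1427}

# Every character in a value occurs once per row holding that value.
char_counts = Counter()
for value, n in counts.items():
    for ch in value:
        char_counts[ch] += n

total_chars = sum(char_counts.values())
for ch, n in char_counts.most_common():
    print(ch, n, f"{100 * n / total_chars:.1f}%")
```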
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 43762 |
| Distinct | 59 |
|---|---|
| Distinct (%) | 0.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 159.8 KiB |
Length
| Max length | 52 |
|---|---|
| Median length | 28 |
| Mean length | 10.895571 |
| Min length | 4 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | France |
|---|---|
| 2nd row | Australia |
| 3rd row | India |
| 4th row | France |
| 5th row | India |
| Value | Count | Frequency (%) |
| of | 2202 | 12.0% |
| united | 2122 | 11.6% |
| india | 1879 | 10.2% |
| states | 1838 | 10.0% |
| america | 1838 | 10.0% |
| other | 529 | 2.9% |
| brazil | 456 | 2.5% |
| japan | 402 | 2.2% |
| russia | 362 | 2.0% |
| ireland | 320 | 1.7% |
| Other values (63) | 6392 | 34.9% |
Most occurring characters
| Value | Count | Frequency (%) |
| a | 13320 | 12.0% |
| i | 10131 | 9.1% |
| e | 9931 | 8.9% |
| n | 8714 | 7.8% |
| t | 8233 | 7.4% |
| (space) | 8113 | 7.3% |
| r | 6513 | 5.8% |
| d | 5733 | 5.1% |
| o | 4093 | 3.7% |
| s | 3342 | 3.0% |
| Other values (39) | 33306 | 29.9% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 111429 |
| Distinct | 7 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 159.8 KiB |
| Value | Count |
|---|---|
| Master’s degree | 4882 |
| Bachelor’s degree | 2680 |
| Doctoral degree | 1798 |
| Professional degree | 353 |
| Some college/university study without earning a bachelor’s degree | 308 |
| Other values (2) | 206 |
Length
| Max length | 65 |
|---|---|
| Median length | 15 |
| Mean length | 17.436296 |
| Min length | 15 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Master’s degree |
|---|---|
| 2nd row | Master’s degree |
| 3rd row | Bachelor’s degree |
| 4th row | Master’s degree |
| 5th row | Master’s degree |
Common Values
| Value | Count | Frequency (%) |
| Master’s degree | 4882 | 47.7% |
| Bachelor’s degree | 2680 | 26.2% |
| Doctoral degree | 1798 | 17.6% |
| Professional degree | 353 | 3.5% |
| Some college/university study without earning a bachelor’s degree | 308 | 3.0% |
| I prefer not to answer | 113 | 1.1% |
| No formal education past high school | 93 | 0.9% |
Length
Common Values (Plot)
| Value | Count | Frequency (%) |
| degree | 10021 | 43.5% |
| master’s | 4882 | 21.2% |
| bachelor’s | 2988 | 13.0% |
| doctoral | 1798 | 7.8% |
| professional | 353 | 1.5% |
| some | 308 | 1.3% |
| college/university | 308 | 1.3% |
| study | 308 | 1.3% |
| without | 308 | 1.3% |
| earning | 308 | 1.3% |
| Other values (12) | 1431 | 6.2% |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 40258 | 22.6% |
| r | 21090 | 11.8% |
| s | 14373 | 8.1% |
| (space) | 12786 | 7.2% |
| a | 11029 | 6.2% |
| g | 10730 | 6.0% |
| d | 10422 | 5.8% |
| o | 8905 | 5.0% |
| t | 8324 | 4.7% |
| ’ | 7870 | 4.4% |
| Other values (21) | 32534 | 18.2% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 178321 |
| Distinct | 10 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 159.8 KiB |
| Value | Count |
|---|---|
| Data Scientist | 3243 |
| Software Engineer | 1842 |
| Data Analyst | 1153 |
| Other | 1118 |
| Research Scientist | 1072 |
| Other values (5) | 1799 |
Length
| Max length | 23 |
|---|---|
| Median length | 18 |
| Mean length | 14.307324 |
| Min length | 5 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Software Engineer |
|---|---|
| 2nd row | Other |
| 3rd row | Other |
| 4th row | Data Scientist |
| 5th row | Data Scientist |
Common Values
| Value | Count | Frequency (%) |
| Data Scientist | 3243 | 31.7% |
| Software Engineer | 1842 | 18.0% |
| Data Analyst | 1153 | 11.3% |
| Other | 1118 | 10.9% |
| Research Scientist | 1072 | 10.5% |
| Product/Project Manager | 530 | 5.2% |
| Business Analyst | 509 | 5.0% |
| Data Engineer | 448 | 4.4% |
| Statistician | 203 | 2.0% |
| DBA/Database Engineer | 109 | 1.1% |
Length
Common Values (Plot)
| Value | Count | Frequency (%) |
| data | 4844 | 25.3% |
| scientist | 4315 | 22.6% |
| engineer | 2399 | 12.5% |
| software | 1842 | 9.6% |
| analyst | 1662 | 8.7% |
| other | 1118 | 5.8% |
| research | 1072 | 5.6% |
| product/project | 530 | 2.8% |
| manager | 530 | 2.8% |
| business | 509 | 2.7% |
| Other values (2) | 312 | 1.6% |
Most occurring characters
| Value | Count | Frequency (%) |
| t | 19874 | 13.6% |
| a | 16057 | 11.0% |
| e | 15895 | 10.9% |
| i | 12147 | 8.3% |
| n | 12017 | 8.2% |
| (space) | 8906 | 6.1% |
| s | 8888 | 6.1% |
| r | 8021 | 5.5% |
| c | 6650 | 4.5% |
| S | 6360 | 4.3% |
| Other values (20) | 31506 | 21.5% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 146321 |
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 159.8 KiB |
| Value | Count |
|---|---|
| 0-49 employees | 2849 |
| > 10,000 employees | 2327 |
| 1000-9,999 employees | 2010 |
| 50-249 employees | 1687 |
| 250-999 employees | 1354 |
Length
| Max length | 20 |
|---|---|
| Median length | 18 |
| Mean length | 16.816466 |
| Min length | 14 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1000-9,999 employees |
|---|---|
| 2nd row | > 10,000 employees |
| 3rd row | 0-49 employees |
| 4th row | 0-49 employees |
| 5th row | 50-249 employees |
Common Values
| Value | Count | Frequency (%) |
| 0-49 employees | 2849 | 27.9% |
| > 10,000 employees | 2327 | 22.8% |
| 1000-9,999 employees | 2010 | 19.7% |
| 50-249 employees | 1687 | 16.5% |
| 250-999 employees | 1354 | 13.2% |
Length
Common Values (Plot)
| Value | Count | Frequency (%) |
| employees | 10227 | 44.9% |
| 0-49 | 2849 | 12.5% |
| > | 2327 | 10.2% |
| 10,000 | 2327 | 10.2% |
| 1000-9,999 | 2010 | 8.8% |
| 50-249 | 1687 | 7.4% |
| 250-999 | 1354 | 5.9% |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 30681 | 17.8% |
| 0 | 21228 | 12.3% |
| 9 | 16638 | 9.7% |
| (space) | 12554 | 7.3% |
| o | 10227 | 5.9% |
| s | 10227 | 5.9% |
| y | 10227 | 5.9% |
| l | 10227 | 5.9% |
| p | 10227 | 5.9% |
| m | 10227 | 5.9% |
| Other values (7) | 29519 | 17.2% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 171982 |
| Distinct | 7 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 159.8 KiB |
| Value | Count |
|---|---|
| 20+ | 2416 |
| 1-2 | 2306 |
| 3-4 | 1792 |
| 5-9 | 1421 |
| 0 | 1232 |
| Other values (2) | 1060 |
Length
| Max length | 5 |
|---|---|
| Median length | 3 |
| Mean length | 2.9663635 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 20+ |
| 3rd row | 0 |
| 4th row | 3-4 |
| 5th row | 20+ |
Common Values
| Value | Count | Frequency (%) |
| 20+ | 2416 | 23.6% |
| 1-2 | 2306 | 22.5% |
| 3-4 | 1792 | 17.5% |
| 5-9 | 1421 | 13.9% |
| 0 | 1232 | 12.0% |
| 10-14 | 738 | 7.2% |
| 15-19 | 322 | 3.1% |
Length
Common Values (Plot)
| Value | Count | Frequency (%) |
| 20 | 2416 | 23.6% |
| 1-2 | 2306 | 22.5% |
| 3-4 | 1792 | 17.5% |
| 5-9 | 1421 | 13.9% |
| 0 | 1232 | 12.0% |
| 10-14 | 738 | 7.2% |
| 15-19 | 322 | 3.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| - | 6579 | 21.7% |
| 2 | 4722 | 15.6% |
| 1 | 4426 | 14.6% |
| 0 | 4386 | 14.5% |
| 4 | 2530 | 8.3% |
| + | 2416 | 8.0% |
| 3 | 1792 | 5.9% |
| 5 | 1743 | 5.7% |
| 9 | 1743 | 5.7% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 30337 |
| Distinct | 6 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 159.8 KiB |
| Value | Count |
|---|---|
| We recently started using ML methods (i.e., models in production for less than 2 years) | 2236 |
| We are exploring ML methods (and may one day put a model into production) | 2195 |
| We have well established ML methods (i.e., models in production for more than 2 years) | 2077 |
| No (we do not use ML methods) | 1737 |
| We use ML methods for generating insights (but do not put working models into production) | 1246 |
Length
| Max length | 89 |
|---|---|
| Median length | 86 |
| Mean length | 68.859294 |
| Min length | 13 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | I do not know |
|---|---|
| 2nd row | I do not know |
| 3rd row | No (we do not use ML methods) |
| 4th row | We have well established ML methods (i.e., models in production for more than 2 years) |
| 5th row | We are exploring ML methods (and may one day put a model into production) |
Common Values
| Value | Count | Frequency (%) |
| We recently started using ML methods (i.e., models in production for less than 2 years) | 2236 | 21.9% |
| We are exploring ML methods (and may one day put a model into production) | 2195 | 21.5% |
| We have well established ML methods (i.e., models in production for more than 2 years) | 2077 | 20.3% |
| No (we do not use ML methods) | 1737 | 17.0% |
| We use ML methods for generating insights (but do not put working models into production) | 1246 | 12.2% |
| I do not know | 736 | 7.2% |
Length
Common Values (Plot)
| Value | Count | Frequency (%) |
| we | 9491 | 7.3% |
| ml | 9491 | 7.3% |
| methods | 9491 | 7.3% |
| production | 7754 | 6.0% |
| models | 5559 | 4.3% |
| for | 5559 | 4.3% |
| years | 4313 | 3.3% |
| i.e | 4313 | 3.3% |
| in | 4313 | 3.3% |
| than | 4313 | 3.3% |
| Other values (29) | 64621 | 50.0% |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 118991 | 16.9% |
| e | 66751 | 9.5% |
| o | 59377 | 8.4% |
| t | 44682 | 6.3% |
| n | 40317 | 5.7% |
| s | 37936 | 5.4% |
| d | 37421 | 5.3% |
| i | 31313 | 4.4% |
| r | 31057 | 4.4% |
| a | 27237 | 3.9% |
| Other values (24) | 209142 | 29.7% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 704224 |
| Distinct | 25 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 159.8 KiB |
| Value | Count |
|---|---|
| $0-999 | 1064 |
| 10,000-14,999 | 685 |
| 100,000-124,999 | 649 |
| 30,000-39,999 | 633 |
| 40,000-49,999 | 622 |
| Other values (20) | 6574 |
Length
| Max length | 15 |
|---|---|
| Median length | 13 |
| Mean length | 12.215019 |
| Min length | 6 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 30,000-39,999 |
|---|---|
| 2nd row | 250,000-299,999 |
| 3rd row | 4,000-4,999 |
| 4th row | 60,000-69,999 |
| 5th row | 10,000-14,999 |
Common Values
| Value | Count | Frequency (%) |
| $0-999 | 1064 | 10.4% |
| 10,000-14,999 | 685 | 6.7% |
| 100,000-124,999 | 649 | 6.3% |
| 30,000-39,999 | 633 | 6.2% |
| 40,000-49,999 | 622 | 6.1% |
| 50,000-59,999 | 604 | 5.9% |
| 60,000-69,999 | 501 | 4.9% |
| 70,000-79,999 | 458 | 4.5% |
| 15,000-19,999 | 452 | 4.4% |
| 20,000-24,999 | 438 | 4.3% |
| Other values (15) | 4121 | 40.3% |
Length
| Value | Count | Frequency (%) |
| 0-999 | 1064 | 10.4% |
| 10,000-14,999 | 685 | 6.7% |
| 100,000-124,999 | 649 | 6.3% |
| 30,000-39,999 | 633 | 6.2% |
| 40,000-49,999 | 622 | 6.1% |
| 50,000-59,999 | 604 | 5.9% |
| 60,000-69,999 | 501 | 4.9% |
| 70,000-79,999 | 458 | 4.5% |
| 15,000-19,999 | 452 | 4.4% |
| 20,000-24,999 | 438 | 4.3% |
| Other values (16) | 4169 | 40.6% |
Most occurring characters
| Value | Count | Frequency (%) |
| 9 | 36688 | 29.4% |
| 0 | 35359 | 28.3% |
| , | 18278 | 14.6% |
| - | 10179 | 8.1% |
| 1 | 6036 | 4.8% |
| 4 | 4480 | 3.6% |
| 5 | 3785 | 3.0% |
| 2 | 3758 | 3.0% |
| 3 | 1796 | 1.4% |
| 7 | 1656 | 1.3% |
| Other values (5) | 2908 | 2.3% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 124923 |
| Distinct | 6 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 159.8 KiB |
| Value | Count |
|---|---|
| $0 (USD) | 3161 |
| $100-$999 | 1995 |
| $1000-$9,999 | 1859 |
| $1-$99 | 1215 |
| $10,000-$99,999 | 1128 |
Length
| Max length | 17 |
|---|---|
| Median length | 15 |
| Mean length | 10.221375 |
| Min length | 6 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | $0 (USD) |
|---|---|
| 2nd row | $10,000-$99,999 |
| 3rd row | $0 (USD) |
| 4th row | $10,000-$99,999 |
| 5th row | $100-$999 |
Common Values
| Value | Count | Frequency (%) |
| $0 (USD) | 3161 | 30.9% |
| $100-$999 | 1995 | 19.5% |
| $1000-$9,999 | 1859 | 18.2% |
| $1-$99 | 1215 | 11.9% |
| $10,000-$99,999 | 1128 | 11.0% |
| > $100,000 ($USD) | 869 | 8.5% |
Length
Common Values (Plot)
| Value | Count | Frequency (%) |
| usd | 4030 | 26.6% |
| 0 | 3161 | 20.9% |
| 100-$999 | 1995 | 13.2% |
| 1000-$9,999 | 1859 | 12.3% |
| 1-$99 | 1215 | 8.0% |
| 10,000-$99,999 | 1128 | 7.5% |
| > | 869 | 5.7% |
| 100,000 | 869 | 5.7% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 21585 | 20.6% |
| 9 | 21491 | 20.6% |
| $ | 17293 | 16.5% |
| 1 | 7066 | 6.8% |
| - | 6197 | 5.9% |
| , | 4984 | 4.8% |
| (space) | 4899 | 4.7% |
| ( | 4030 | 3.9% |
| U | 4030 | 3.9% |
| S | 4030 | 3.9% |
| Other values (3) | 8929 | 8.5% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 104534 |
| Distinct | 3469 |
|---|---|
| Distinct (%) | 33.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 159.8 KiB |
Length
| Max length | 89 |
|---|---|
| Median length | 86 |
| Mean length | 75.376161 |
| Min length | 25 |
Unique
| Unique | 3030 |
|---|---|
| Unique (%) | 29.6% |
Sample
| 1st row | Basic statistical software (Microsoft Excel, Google Sheets, etc.), 0, -1, -1, -1, -1 |
|---|---|
| 2nd row | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 0, -1 |
| 3rd row | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 1, -1 |
| 4th row | Advanced statistical software (SPSS, SAS, etc.), -1, 0, -1, -1, -1 |
| 5th row | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 2, -1 |
| Value | Count | Frequency (%) |
| 1 | 42435 | 36.7% |
| etc | 9496 | 8.2% |
| local | 5586 | 4.8% |
| development | 5586 | 4.8% |
| environments | 5586 | 4.8% |
| rstudio | 5586 | 4.8% |
| jupyterlab | 5586 | 4.8% |
| software | 3910 | 3.4% |
| statistical | 2288 | 2.0% |
| basic | 1653 | 1.4% |
| Other values (2157) | 27942 | 24.2% |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 105427 | 13.7% |
| , | 71749 | 9.3% |
| e | 62224 | 8.1% |
| t | 48953 | 6.4% |
| 1 | 45973 | 6.0% |
| - | 42584 | 5.5% |
| o | 35168 | 4.6% |
| a | 26812 | 3.5% |
| n | 25019 | 3.2% |
| c | 24324 | 3.2% |
| Other values (45) | 282639 | 36.7% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 770872 |
| Distinct | 6 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 159.8 KiB |
| Value | Count |
|---|---|
| 3-5 years | 2672 |
| 1-2 years | 2542 |
| < 1 years | 1892 |
| 5-10 years | 1663 |
| 10-20 years | 955 |
Length
| Max length | 11 |
|---|---|
| Median length | 9 |
| Mean length | 9.3493693 |
| Min length | 9 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1-2 years |
|---|---|
| 2nd row | 1-2 years |
| 3rd row | < 1 years |
| 4th row | 20+ years |
| 5th row | 3-5 years |
Common Values
| Value | Count | Frequency (%) |
| 3-5 years | 2672 | 26.1% |
| 1-2 years | 2542 | 24.9% |
| < 1 years | 1892 | 18.5% |
| 5-10 years | 1663 | 16.3% |
| 10-20 years | 955 | 9.3% |
| 20+ years | 503 | 4.9% |
Length
Common Values (Plot)
| Value | Count | Frequency (%) |
| years | 10227 | 45.8% |
| 3-5 | 2672 | 12.0% |
| 1-2 | 2542 | 11.4% |
| < | 1892 | 8.5% |
| 1 | 1892 | 8.5% |
| 5-10 | 1663 | 7.4% |
| 10-20 | 955 | 4.3% |
| 20 | 503 | 2.3% |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 12119 | 12.7% |
| y | 10227 | 10.7% |
| e | 10227 | 10.7% |
| a | 10227 | 10.7% |
| r | 10227 | 10.7% |
| s | 10227 | 10.7% |
| - | 7832 | 8.2% |
| 1 | 7052 | 7.4% |
| 5 | 4335 | 4.5% |
| 0 | 4076 | 4.3% |
| Other values (4) | 9067 | 9.5% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 95616 |
What programming language would you recommend an aspiring data scientist to learn first?
Categorical
Imbalance 
| Distinct | 12 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 159.8 KiB |
| Value | Count |
|---|---|
| Python | 7880 |
| R | 1036 |
| SQL | 711 |
| C++ | 122 |
| MATLAB | 110 |
| Other values (7) | 368 |
Length
| Max length | 10 |
|---|---|
| Median length | 6 |
| Mean length | 5.186565 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Python |
|---|---|
| 2nd row | Python |
| 3rd row | Python |
| 4th row | Java |
| 5th row | Python |
Common Values
| Value | Count | Frequency (%) |
| Python | 7880 | 77.1% |
| R | 1036 | 10.1% |
| SQL | 711 | 7.0% |
| C++ | 122 | 1.2% |
| MATLAB | 110 | 1.1% |
| Other | 102 | 1.0% |
| C | 80 | 0.8% |
| Java | 64 | 0.6% |
| None | 56 | 0.5% |
| Javascript | 34 | 0.3% |
| Other values (2) | 32 | 0.3% |
Length
| Value | Count | Frequency (%) |
| python | 7880 | 77.1% |
| r | 1036 | 10.1% |
| sql | 711 | 7.0% |
| c | 202 | 2.0% |
| matlab | 110 | 1.1% |
| other | 102 | 1.0% |
| java | 64 | 0.6% |
| none | 56 | 0.5% |
| javascript | 34 | 0.3% |
| bash | 27 | 0.3% |
Most occurring characters
| Value | Count | Frequency (%) |
| t | 8021 | 15.1% |
| h | 8009 | 15.1% |
| o | 7936 | 15.0% |
| n | 7936 | 15.0% |
| y | 7885 | 14.9% |
| P | 7880 | 14.9% |
| R | 1036 | 2.0% |
| L | 821 | 1.5% |
| S | 716 | 1.3% |
| Q | 711 | 1.3% |
| Other values (17) | 2092 | 3.9% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 53043 |
Have you ever used a TPU (tensor processing unit)?
Categorical
Imbalance 
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 159.8 KiB |
| Never | 8252 |
|---|---|
| Once | 956 |
| 2-5 times | 768 |
| 6-24 times | 134 |
| > 25 times | 117 |
Length
| Max length | 10 |
|---|---|
| Median length | 5 |
| Mean length | 5.3296177 |
| Min length | 4 |
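The length figures above can be cross-checked from the response counts. The following is a minimal sketch, assuming each length is simply the character count of the answer string. The counts are copied from this variable's Common Values table, and `length_stats` is an illustrative name rather than anything from ydata-profiling.

```python
from statistics import mean, median

# Response counts copied from this variable's Common Values table.
counts = {
    "Never": 8252,
    "Once": 956,
    "2-5 times": 768,
    "6-24 times": 134,
    "> 25 times": 117,
}

# Expand to one string length per response, then summarise.
lengths = [len(value) for value, n in counts.items() for _ in range(n)]

length_stats = {
    "max": max(lengths),
    "median": median(lengths),
    "mean": mean(lengths),
    "min": min(lengths),
}
print(length_stats)
```

Under that assumption the sketch should agree with the table: maximum 10, median 5, minimum 4, and a mean of about 5.33.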
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Never |
|---|---|
| 2nd row | Once |
| 3rd row | Never |
| 4th row | Never |
| 5th row | 6-24 times |
Common Values
| Value | Count | Frequency (%) |
| Never | 8252 | 80.7% |
| Once | 956 | 9.3% |
| 2-5 times | 768 | 7.5% |
| 6-24 times | 134 | 1.3% |
| > 25 times | 117 | 1.1% |
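The Imbalance flag on this variable can be reproduced from the counts above. The sketch below assumes the score is one minus the normalized Shannon entropy of the class distribution; that definition is an assumption about how the report computes it, and `imbalance_score` is an illustrative helper rather than the library's API. With these counts it gives the 56.7% that the report's overview quotes for this variable.

```python
import math

# Response counts copied from the Common Values table above.
counts = [8252, 956, 768, 134, 117]


def imbalance_score(class_counts):
    """Return 1 - H / log2(k): 0 for a uniform split, near 1 when one class dominates."""
    n_classes = len(class_counts)
    if n_classes <= 1:
        return 0.0
    total = sum(class_counts)
    probs = [c / total for c in class_counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1.0 - entropy / math.log2(n_classes)


score = imbalance_score(counts)
print(f"{score:.1%}")  # prints 56.7%
```

A perfectly uniform answer distribution would score 0 under this definition, so a high value mainly reflects how heavily "Never" dominates.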
Words
| Value | Count | Frequency (%) |
| never | 8252 | 72.6% |
| times | 1019 | 9.0% |
| once | 956 | 8.4% |
| 2-5 | 768 | 6.8% |
| 6-24 | 134 | 1.2% |
| (blank) | 117 | 1.0% |
| 25 | 117 | 1.0% |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 18479 | 33.9% |
| N | 8252 | 15.1% |
| v | 8252 | 15.1% |
| r | 8252 | 15.1% |
| (space) | 1136 | 2.1% |
| s | 1019 | 1.9% |
| 2 | 1019 | 1.9% |
| t | 1019 | 1.9% |
| i | 1019 | 1.9% |
| m | 1019 | 1.9% |
| Other values (8) | 5040 | 9.2% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 54506 |
For how many years have you used machine learning methods?
Categorical
| Distinct | 8 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 159.8 KiB |
| < 1 years | 2966 |
|---|---|
| 1-2 years | 2641 |
| 2-3 years | 1526 |
| 3-4 years | 946 |
| 4-5 years | 850 |
| Other values (3) | 1298 |
Length
| Max length | 11 |
|---|---|
| Median length | 9 |
| Mean length | 9.1413904 |
| Min length | 9 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1-2 years |
|---|---|
| 2nd row | 2-3 years |
| 3rd row | < 1 years |
| 4th row | 10-15 years |
| 5th row | 2-3 years |
Common Values
| Value | Count | Frequency (%) |
| < 1 years | 2966 | 29.0% |
| 1-2 years | 2641 | 25.8% |
| 2-3 years | 1526 | 14.9% |
| 3-4 years | 946 | 9.3% |
| 4-5 years | 850 | 8.3% |
| 5-10 years | 808 | 7.9% |
| 10-15 years | 319 | 3.1% |
| 20+ years | 171 | 1.7% |
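The Frequency (%) column is each count divided by the number of observations. The sketch below recomputes it from the counts in the table above, assuming one-decimal rounding as displayed in the report. `frequency_pct` and `distinct_pct` are illustrative names.

```python
# Response counts copied from the Common Values table above.
counts = {
    "< 1 years": 2966,
    "1-2 years": 2641,
    "2-3 years": 1526,
    "3-4 years": 946,
    "4-5 years": 850,
    "5-10 years": 808,
    "10-15 years": 319,
    "20+ years": 171,
}

total = sum(counts.values())  # equals the number of observations

# Share of each answer, as a percentage rounded to one decimal place.
frequency_pct = {value: round(100 * n / total, 1) for value, n in counts.items()}

# Distinct (%) is the number of distinct answers over the number of rows.
distinct_pct = round(100 * len(counts) / total, 1)

for value, pct in frequency_pct.items():
    print(f"{value:<12} {counts[value]:>5} {pct:>5}%")
print("Distinct (%):", distinct_pct)
```

The counts sum to 10227, so the table accounts for every row and no response is hidden in an "Other values" bucket.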
Words
| Value | Count | Frequency (%) |
| years | 10227 | 43.7% |
| (blank) | 2966 | 12.7% |
| 1 | 2966 | 12.7% |
| 1-2 | 2641 | 11.3% |
| 2-3 | 1526 | 6.5% |
| 3-4 | 946 | 4.0% |
| 4-5 | 850 | 3.6% |
| 5-10 | 808 | 3.5% |
| 10-15 | 319 | 1.4% |
| 20 | 171 | 0.7% |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 13193 | 14.1% |
| y | 10227 | 10.9% |
| e | 10227 | 10.9% |
| a | 10227 | 10.9% |
| r | 10227 | 10.9% |
| s | 10227 | 10.9% |
| - | 7090 | 7.6% |
| 1 | 7053 | 7.5% |
| 2 | 4338 | 4.6% |
| < | 2966 | 3.2% |
| Other values (5) | 7714 | 8.3% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 93489 |
| Distinct | 902 |
|---|---|
| Distinct (%) | 8.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 159.8 KiB |
Length
| Max length | 505 |
|---|---|
| Median length | 395 |
| Mean length | 163.61435 |
| Min length | 4 |
Unique
| Unique | 304 |
|---|---|
| Unique (%) | 3.0% |
Sample
| 1st row | Twitter (data science influencers), Kaggle (forums, blog, social media, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Journal Publications (traditional publications, preprint journals, etc) |
|---|---|
| 2nd row | Podcasts (Chai Time Data Science, Linear Digressions, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Journal Publications (traditional publications, preprint journals, etc), Slack Communities (ods.ai, kagglenoobs, etc) |
| 3rd row | YouTube (Cloud AI Adventures, Siraj Raval, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Other |
| 4th row | YouTube (Cloud AI Adventures, Siraj Raval, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Journal Publications (traditional publications, preprint journals, etc) |
| 5th row | Kaggle (forums, blog, social media, etc), Course Forums (forums.fast.ai, etc), YouTube (Cloud AI Adventures, Siraj Raval, etc), Podcasts (Chai Time Data Science, Linear Digressions, etc), Journal Publications (traditional publications, preprint journals, etc) |
| Value | Count | Frequency (%) |
| etc | 28447 | 13.9% |
| data | 10481 | 5.1% |
| science | 10481 | 5.1% |
| forums | 9222 | 4.5% |
| kaggle | 6783 | 3.3% |
| blog | 6783 | 3.3% |
| social | 6783 | 3.3% |
| media | 6783 | 3.3% |
| kdnuggets | 6550 | 3.2% |
| vidhya | 6550 | 3.2% |
| Other values (36) | 105472 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 194108 | 11.6% |
| e | 124958 | 7.5% |
| a | 117653 | 7.0% |
| i | 99034 | 5.9% |
| t | 90903 | 5.4% |
| , | 90663 | 5.4% |
| s | 86036 | 5.1% |
| c | 84486 | 5.0% |
| o | 78256 | 4.7% |
| n | 68808 | 4.1% |
| Other values (39) | 638379 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 1673284 |
| Distinct | 713 |
|---|---|
| Distinct (%) | 7.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 159.8 KiB |
Length
| Max length | 169 |
|---|---|
| Median length | 148 |
| Mean length | 41.713601 |
| Min length | 3 |
Unique
| Unique | 244 |
|---|---|
| Unique (%) | 2.4% |
Sample
| 1st row | Coursera, DataCamp, Kaggle Courses (i.e. Kaggle Learn), Udemy |
|---|---|
| 2nd row | Coursera, edX, DataCamp, University Courses (resulting in a university degree) |
| 3rd row | Other |
| 4th row | None |
| 5th row | Udacity, Coursera, edX, Kaggle Courses (i.e. Kaggle Learn), Udemy |
| Value | Count | Frequency (%) |
| kaggle | 6470 | 11.5% |
| courses | 5999 | 10.6% |
| coursera | 5810 | 10.3% |
| university | 5528 | 9.8% |
| learn | 3235 | 5.7% |
| i.e | 3235 | 5.7% |
| udemy | 3115 | 5.5% |
| a | 2764 | 4.9% |
| degree | 2764 | 4.9% |
| resulting | 2764 | 4.9% |
| Other values (10) | 14685 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 50571 | 11.9% |
| (space) | 46142 | 10.8% |
| r | 33821 | 7.9% |
| a | 32264 | 7.6% |
| s | 27657 | 6.5% |
| i | 24626 | 5.8% |
| g | 19304 | 4.5% |
| n | 18367 | 4.3% |
| u | 17805 | 4.2% |
| t | 16101 | 3.8% |
| Other values (25) | 139947 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 426605 |
Which of the following integrated development environments (IDE's) do you use on a regular basis?
Text
| Distinct | 752 |
|---|---|
| Distinct (%) | 7.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 159.8 KiB |
Length
| Max length | 185 |
|---|---|
| Median length | 158 |
| Mean length | 65.056322 |
| Min length | 4 |
Unique
| Unique | 260 |
|---|---|
| Unique (%) | 2.5% |
Sample
| 1st row | Jupyter (JupyterLab, Jupyter Notebooks, etc) , RStudio , PyCharm , MATLAB , Spyder |
|---|---|
| 2nd row | Jupyter (JupyterLab, Jupyter Notebooks, etc) , Visual Studio / Visual Studio Code |
| 3rd row | Jupyter (JupyterLab, Jupyter Notebooks, etc) |
| 4th row | RStudio , Other |
| 5th row | Jupyter (JupyterLab, Jupyter Notebooks, etc) , Spyder , Notepad++ , Sublime Text |
| Value | Count | Frequency (%) |
| (blank) | 21927 | 23.0% |
| jupyter | 14972 | 15.7% |
| notebooks | 7486 | 7.8% |
| etc | 7486 | 7.8% |
| jupyterlab | 7486 | 7.8% |
| visual | 6468 | 6.8% |
| studio | 6468 | 6.8% |
| rstudio | 3334 | 3.5% |
| code | 3234 | 3.4% |
| pycharm | 2999 | 3.1% |
| Other values (10) | 13680 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 129730 | 19.5% |
| t | 53060 | 8.0% |
| e | 49519 | 7.4% |
| u | 40541 | 6.1% |
| o | 39151 | 5.9% |
| , | 32273 | 4.9% |
| r | 28028 | 4.2% |
| y | 27509 | 4.1% |
| p | 27007 | 4.1% |
| J | 22458 | 3.4% |
| Other values (29) | 216055 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 665331 |
| Distinct | 228 |
|---|---|
| Distinct (%) | 2.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 159.8 KiB |
Length
| Max length | 295 |
|---|---|
| Median length | 254 |
| Mean length | 29.367948 |
| Min length | 4 |
Unique
| Unique | 96 |
|---|---|
| Unique (%) | 0.9% |
Sample
| 1st row | None |
|---|---|
| 2nd row | Microsoft Azure Notebooks |
| 3rd row | Google Colab , Google Cloud Notebook Products (AI Platform, Datalab, etc) |
| 4th row | None |
| 5th row | Kaggle Notebooks (Kernels) , Google Colab , Binder / JupyterHub |
| Value | Count | Frequency (%) |
| (blank) | 5411 | 12.7% |
| notebooks | 5161 | 12.1% |
| none | 3870 | 9.1% |
| (blank) | 3771 | 8.8% |
| kernels | 3225 | 7.5% |
| kaggle | 3225 | 7.5% |
| colab | 2982 | 7.0% |
| notebook | 1427 | 3.3% |
| products | 1427 | 3.3% |
| etc | 1427 | 3.3% |
| Other values (20) | 10816 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 47961 | 16.0% |
| o | 39506 | 13.2% |
| e | 30394 | 10.1% |
| l | 15641 | 5.2% |
| t | 14204 | 4.7% |
| b | 11585 | 3.9% |
| a | 11491 | 3.8% |
| s | 11037 | 3.7% |
| g | 10859 | 3.6% |
| N | 10458 | 3.5% |
| Other values (34) | 97210 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 300346 |
| Distinct | 542 |
|---|---|
| Distinct (%) | 5.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 159.8 KiB |
Length
| Max length | 70 |
|---|---|
| Median length | 58 |
| Mean length | 15.099736 |
| Min length | 1 |
Unique
| Unique | 181 |
|---|---|
| Unique (%) | 1.8% |
Sample
| 1st row | Python, R, SQL, Java, Javascript, MATLAB |
|---|---|
| 2nd row | Python, R, SQL, Bash |
| 3rd row | Python, SQL |
| 4th row | Python, R |
| 5th row | Python, R, Bash |
| Value | Count | Frequency (%) |
| python | 9016 | |
| sql | 5218 | |
| r | 3514 | 13.0% |
| c | 2185 | 8.1% |
| bash | 1685 | 6.2% |
| javascript | 1617 | 6.0% |
| java | 1501 | 5.6% |
| other | 966 | 3.6% |
| matlab | 922 | 3.4% |
| typescript | 331 | 1.2% |
Most occurring characters
| Value | Count | Frequency (%) |
| , | 16788 | 10.9% |
| (space) | 16788 | 10.9% |
| t | 11930 | 7.7% |
| h | 11667 | 7.6% |
| y | 9347 | 6.1% |
| o | 9076 | 5.9% |
| n | 9076 | 5.9% |
| P | 9016 | 5.8% |
| a | 7921 | 5.1% |
| L | 6140 | 4.0% |
| Other values (19) | 46676 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 154425 |
| Distinct | 412 |
|---|---|
| Distinct (%) | 4.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 159.8 KiB |
Length
| Max length | 141 |
|---|---|
| Median length | 123 |
| Mean length | 30.944656 |
| Min length | 4 |
Unique
| Unique | 158 |
|---|---|
| Unique (%) | 1.5% |
Sample
| 1st row | Matplotlib |
|---|---|
| 2nd row | Ggplot / ggplot2 , Matplotlib , Seaborn |
| 3rd row | Matplotlib , Plotly / Plotly Express , Seaborn |
| 4th row | Ggplot / ggplot2 |
| 5th row | Matplotlib , Plotly / Plotly Express , Bokeh , Seaborn |
| Value | Count | Frequency (%) |
| (blank) | 18890 | 37.4% |
| matplotlib | 7288 | 14.4% |
| plotly | 4994 | 9.9% |
| seaborn | 4862 | 9.6% |
| ggplot | 3177 | 6.3% |
| ggplot2 | 3177 | 6.3% |
| express | 2497 | 4.9% |
| shiny | 1079 | 2.1% |
| none | 915 | 1.8% |
| d3.js | 903 | 1.8% |
| Other values (6) | 2722 | 5.4% |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 70917 | 22.4% |
| l | 32894 | 10.4% |
| t | 27349 | 8.6% |
| o | 26638 | 8.4% |
| p | 16603 | 5.2% |
| , | 12758 | 4.0% |
| a | 12740 | 4.0% |
| b | 12614 | 4.0% |
| e | 10864 | 3.4% |
| g | 9531 | 3.0% |
| Other values (28) | 83563 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 316471 |
Which types of specialized hardware do you use on a regular basis?
Categorical
| Distinct | 14 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 159.8 KiB |
| CPUs | 3687 |
|---|---|
| CPUs, GPUs | 3680 |
| None / I do not know | 1683 |
| GPUs | 751 |
| CPUs, GPUs, TPUs | 248 |
| Other values (9) | 178 |
Length
| Max length | 23 |
|---|---|
| Median length | 20 |
| Mean length | 9.1786448 |
| Min length | 4 |
Unique
| Unique | 1 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | CPUs, GPUs |
|---|---|
| 2nd row | CPUs, GPUs |
| 3rd row | CPUs, GPUs |
| 4th row | CPUs, GPUs |
| 5th row | CPUs, GPUs |
Common Values
| Value | Count | Frequency (%) |
| CPUs | 3687 | 36.1% |
| CPUs, GPUs | 3680 | 36.0% |
| None / I do not know | 1683 | 16.5% |
| GPUs | 751 | 7.3% |
| CPUs, GPUs, TPUs | 248 | 2.4% |
| GPUs, TPUs | 50 | 0.5% |
| Other | 40 | 0.4% |
| CPUs, TPUs | 23 | 0.2% |
| CPUs, GPUs, Other | 21 | 0.2% |
| TPUs | 21 | 0.2% |
| Other values (4) | 23 | 0.2% |
Words
| Value | Count | Frequency (%) |
| cpus | 7677 | 33.4% |
| gpus | 4760 | 20.7% |
| none | 1683 | 7.3% |
| (blank) | 1683 | 7.3% |
| i | 1683 | 7.3% |
| do | 1683 | 7.3% |
| not | 1683 | 7.3% |
| know | 1683 | 7.3% |
| tpus | 348 | 1.5% |
| other | 84 | 0.4% |
Most occurring characters
| Value | Count | Frequency (%) |
| P | 12785 | 13.6% |
| U | 12785 | 13.6% |
| s | 12785 | 13.6% |
| (space) | 12740 | 13.6% |
| C | 7677 | 8.2% |
| o | 6732 | 7.2% |
| n | 5049 | 5.4% |
| G | 4760 | 5.1% |
| , | 4325 | 4.6% |
| t | 1767 | 1.9% |
| Other values (11) | 12465 | 13.3% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 93870 |
| Distinct | 630 |
|---|---|
| Distinct (%) | 6.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 159.8 KiB |
Length
| Max length | 336 |
|---|---|
| Median length | 288 |
| Mean length | 103.95815 |
| Min length | 4 |
Unique
| Unique | 230 |
|---|---|
| Unique (%) | 2.2% |
Sample
| 1st row | Linear or Logistic Regression |
|---|---|
| 2nd row | Linear or Logistic Regression, Convolutional Neural Networks |
| 3rd row | Linear or Logistic Regression, Decision Trees or Random Forests, Gradient Boosting Machines (xgboost, lightgbm, etc) |
| 4th row | Linear or Logistic Regression, Decision Trees or Random Forests, Gradient Boosting Machines (xgboost, lightgbm, etc), Bayesian Approaches, Convolutional Neural Networks, Generative Adversarial Networks, Recurrent Neural Networks |
| 5th row | Linear or Logistic Regression, Dense Neural Networks (MLPs, etc), Convolutional Neural Networks, Recurrent Neural Networks |
| Value | Count | Frequency (%) |
| or | 13843 | 10.4% |
| networks | 10147 | 7.7% |
| neural | 8736 | 6.6% |
| linear | 7454 | 5.6% |
| logistic | 7454 | 5.6% |
| regression | 7454 | 5.6% |
| etc | 7434 | 5.6% |
| decision | 6389 | 4.8% |
| trees | 6389 | 4.8% |
| random | 6389 | 4.8% |
| Other values (20) | 50827 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 122289 | 11.5% |
| e | 103627 | 9.7% |
| o | 93274 | 8.8% |
| s | 83531 | 7.9% |
| r | 78498 | 7.4% |
| i | 68652 | 6.5% |
| n | 58973 | 5.5% |
| t | 57547 | 5.4% |
| a | 47650 | 4.5% |
| , | 35206 | 3.3% |
| Other values (33) | 313933 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 1063180 |
| Distinct | 92 |
|---|---|
| Distinct (%) | 0.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 159.8 KiB |
Length
| Max length | 374 |
|---|---|
| Median length | 4 |
| Mean length | 46.460057 |
| Min length | 4 |
Unique
| Unique | 17 |
|---|---|
| Unique (%) | 0.2% |
Sample
| 1st row | None |
|---|---|
| 2nd row | Automation of full ML pipelines (e.g. Google AutoML, H20 Driverless AI) |
| 3rd row | None |
| 4th row | Automated model selection (e.g. auto-sklearn, xcessiv), Automated hyperparameter tuning (e.g. hyperopt, ray.tune), Automation of full ML pipelines (e.g. Google AutoML, H20 Driverless AI) |
| 5th row | Automated data augmentation (e.g. imgaug, albumentations), Automated feature engineering/selection (e.g. tpot, boruta_py), Automated model selection (e.g. auto-sklearn, xcessiv), Automation of full ML pipelines (e.g. Google AutoML, H20 Driverless AI) |
| Value | Count | Frequency (%) |
| e.g | 7591 | 13.4% |
| automated | 6645 | 11.8% |
| none | 5708 | 10.1% |
| model | 2625 | 4.6% |
| selection | 2257 | 4.0% |
| auto-sklearn | 2257 | 4.0% |
| xcessiv | 2257 | 4.0% |
| tuning | 1463 | 2.6% |
| ray.tune | 1463 | 2.6% |
| hyperopt | 1463 | 2.6% |
| Other values (24) | 22805 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 56883 | 12.0% |
| (space) | 46307 | 9.7% |
| t | 40565 | 8.5% |
| o | 32966 | 6.9% |
| a | 29760 | 6.3% |
| n | 27099 | 5.7% |
| u | 21484 | 4.5% |
| i | 17800 | 3.7% |
| r | 16781 | 3.5% |
| . | 16645 | 3.5% |
| Other values (31) | 168857 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 475147 |
| Distinct | 554 |
|---|---|
| Distinct (%) | 5.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 159.8 KiB |
Length
| Max length | 129 |
|---|---|
| Median length | 107 |
| Mean length | 36.541606 |
| Min length | 4 |
Unique
| Unique | 194 |
|---|---|
| Unique (%) | 1.9% |
Sample
| 1st row | None |
|---|---|
| 2nd row | Scikit-learn , TensorFlow , Keras , RandomForest |
| 3rd row | Scikit-learn , RandomForest, Xgboost , LightGBM |
| 4th row | Scikit-learn , TensorFlow , Keras , RandomForest, Xgboost , Caret |
| 5th row | Scikit-learn , TensorFlow , Keras , PyTorch |
| Value | Count | Frequency (%) |
| (blank) | 17585 | 35.9% |
| scikit-learn | 6883 | 14.1% |
| keras | 4265 | 8.7% |
| tensorflow | 4233 | 8.6% |
| randomforest | 3457 | 7.1% |
| xgboost | 3367 | 6.9% |
| pytorch | 2517 | 5.1% |
| lightgbm | 1734 | 3.5% |
| none | 1302 | 2.7% |
| caret | 984 | 2.0% |
| Other values (4) | 2617 | 5.3% |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 86729 | 23.2% |
| o | 25933 | 6.9% |
| r | 23427 | 6.3% |
| e | 21408 | 5.7% |
| , | 20328 | 5.4% |
| a | 17843 | 4.8% |
| t | 17434 | 4.7% |
| i | 17029 | 4.6% |
| s | 16047 | 4.3% |
| n | 15875 | 4.2% |
| Other values (27) | 111658 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 373711 |
Correlations
| | Approximately how many individuals are responsible for data science workloads at your place of business? | Approximately how much money have you spent on machine learning and/or cloud computing products at your work in the past 5 years? | Does your current employer incorporate machine learning methods into their business? | For how many years have you used machine learning methods? | Have you ever used a TPU (tensor processing unit)? | How long have you been writing code to analyze data (at work or at school)? | Select the title most similar to your current role (or most recent title if retired) | What is the highest level of formal education that you have attained or plan to attain within the next 2 years? | What is the size of the company where you are employed? | What is your age (# years)? | What is your current yearly compensation (approximate $USD)? | What is your gender? | What programming language would you recommend an aspiring data scientist to learn first? | Which types of specialized hardware do you use on a regular basis? |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Approximately how many individuals are responsible for data science workloads at your place of business? | 1.000 | 0.150 | 0.244 | 0.112 | 0.034 | 0.123 | 0.112 | 0.054 | 0.301 | 0.033 | 0.104 | 0.023 | 0.024 | 0.036 |
| Approximately how much money have you spent on machine learning and/or cloud computing products at your work in the past 5 years? | 0.150 | 1.000 | 0.167 | 0.157 | 0.073 | 0.144 | 0.087 | 0.045 | 0.102 | 0.096 | 0.196 | 0.048 | 0.037 | 0.088 |
| Does your current employer incorporate machine learning methods into their business? | 0.244 | 0.167 | 1.000 | 0.202 | 0.051 | 0.143 | 0.162 | 0.062 | 0.121 | 0.050 | 0.137 | 0.032 | 0.041 | 0.098 |
| For how many years have you used machine learning methods? | 0.112 | 0.157 | 0.202 | 1.000 | 0.086 | 0.463 | 0.154 | 0.147 | 0.036 | 0.160 | 0.165 | 0.065 | 0.050 | 0.112 |
| Have you ever used a TPU (tensor processing unit)? | 0.034 | 0.073 | 0.051 | 0.086 | 1.000 | 0.050 | 0.045 | 0.018 | 0.023 | 0.043 | 0.046 | 0.042 | 0.035 | 0.280 |
| How long have you been writing code to analyze data (at work or at school)? | 0.123 | 0.144 | 0.143 | 0.463 | 0.050 | 1.000 | 0.145 | 0.149 | 0.058 | 0.282 | 0.228 | 0.050 | 0.061 | 0.077 |
| Select the title most similar to your current role (or most recent title if retired) | 0.112 | 0.087 | 0.162 | 0.154 | 0.045 | 0.145 | 1.000 | 0.180 | 0.053 | 0.084 | 0.068 | 0.086 | 0.080 | 0.085 |
| What is the highest level of formal education that you have attained or plan to attain within the next 2 years? | 0.054 | 0.045 | 0.062 | 0.147 | 0.018 | 0.149 | 0.180 | 1.000 | 0.060 | 0.152 | 0.086 | 0.052 | 0.051 | 0.037 |
| What is the size of the company where you are employed? | 0.301 | 0.102 | 0.121 | 0.036 | 0.023 | 0.058 | 0.053 | 0.060 | 1.000 | 0.071 | 0.137 | 0.030 | 0.040 | 0.040 |
| What is your age (# years)? | 0.033 | 0.096 | 0.050 | 0.160 | 0.043 | 0.282 | 0.084 | 0.152 | 0.071 | 1.000 | 0.148 | 0.064 | 0.056 | 0.034 |
| What is your current yearly compensation (approximate $USD)? | 0.104 | 0.196 | 0.137 | 0.165 | 0.046 | 0.228 | 0.068 | 0.086 | 0.137 | 0.148 | 1.000 | 0.077 | 0.038 | 0.044 |
| What is your gender? | 0.023 | 0.048 | 0.032 | 0.065 | 0.042 | 0.050 | 0.086 | 0.052 | 0.030 | 0.064 | 0.077 | 1.000 | 0.040 | 0.163 |
| What programming language would you recommend an aspiring data scientist to learn first? | 0.024 | 0.037 | 0.041 | 0.050 | 0.035 | 0.061 | 0.080 | 0.051 | 0.040 | 0.056 | 0.038 | 0.040 | 1.000 | 0.070 |
| Which types of specialized hardware do you use on a regular basis? | 0.036 | 0.088 | 0.098 | 0.112 | 0.280 | 0.077 | 0.085 | 0.037 | 0.040 | 0.034 | 0.044 | 0.163 | 0.070 | 1.000 |
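The report does not state which association measure fills the matrix above. For pairs of categorical variables, Cramér's V is a common choice, so the following is a minimal sketch of the uncorrected statistic using only the standard library. The function name `cramers_v` and the toy inputs are illustrative assumptions, and values computed this way need not match the table exactly (a bias-corrected variant would give slightly different numbers).

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Uncorrected Cramér's V for two equal-length sequences of category labels.

    Hypothetical helper for illustration; not taken from the report.
    """
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    row_totals = Counter(xs)          # marginal counts of the first variable
    col_totals = Counter(ys)          # marginal counts of the second variable
    joint = Counter(zip(xs, ys))      # observed counts per (x, y) cell
    k = min(len(row_totals), len(col_totals)) - 1
    if n == 0 or k == 0:
        return 0.0                    # a constant column carries no association
    chi2 = 0.0
    for r, r_count in row_totals.items():
        for c, c_count in col_totals.items():
            expected = r_count * c_count / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    return sqrt(chi2 / (n * k))


# Toy data: the second column is fully determined by the first.
age = ["22-24", "40-44", "22-24", "40-44"]
tpu = ["Never", "Once", "Never", "Once"]
v_dependent = cramers_v(age, tpu)

# Toy data: the two columns are independent.
v_independent = cramers_v(["a", "a", "b", "b"], ["x", "y", "x", "y"])
```

The statistic ranges from 0 (no association) to 1 (one variable determines the other) and is symmetric in its arguments, which is consistent with the symmetric matrix and the 1.000 diagonal shown above.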
Sample
| | What is your age (# years)? | What is your gender? | In which country do you currently reside? | What is the highest level of formal education that you have attained or plan to attain within the next 2 years? | Select the title most similar to your current role (or most recent title if retired) | What is the size of the company where you are employed? | Approximately how many individuals are responsible for data science workloads at your place of business? | Does your current employer incorporate machine learning methods into their business? | What is your current yearly compensation (approximate $USD)? | Approximately how much money have you spent on machine learning and/or cloud computing products at your work in the past 5 years? | What is the primary tool that you use at work or school to analyze data? | How long have you been writing code to analyze data (at work or at school)? | What programming language would you recommend an aspiring data scientist to learn first? | Have you ever used a TPU (tensor processing unit)? | For how many years have you used machine learning methods? | Who/what are your favorite media sources that report on data science topics? | On which platforms have you begun or completed data science courses? | Which of the following integrated development environments (IDE's) do you use on a regular basis? | Which of the following hosted notebook products do you use on a regular basis? | What programming languages do you use on a regular basis? | What data visualization libraries or tools do you use on a regular basis? | Which types of specialized hardware do you use on a regular basis? | Which of the following ML algorithms do you use on a regular basis? | Which categories of ML tools do you use on a regular basis? | Which of the following machine learning frameworks do you use on a regular basis? |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 22-24 | Male | France | Master’s degree | Software Engineer | 1000-9,999 employees | 0 | I do not know | 30,000-39,999 | $0 (USD) | Basic statistical software (Microsoft Excel, Google Sheets, etc.), 0, -1, -1, -1, -1 | 1-2 years | Python | Never | 1-2 years | Twitter (data science influencers), Kaggle (forums, blog, social media, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Journal Publications (traditional publications, preprint journals, etc) | Coursera, DataCamp, Kaggle Courses (i.e. Kaggle Learn), Udemy | Jupyter (JupyterLab, Jupyter Notebooks, etc) , RStudio , PyCharm , MATLAB , Spyder | None | Python, R, SQL, Java, Javascript, MATLAB | Matplotlib | CPUs, GPUs | Linear or Logistic Regression | None | None |
| 3 | 40-44 | Male | Australia | Master’s degree | Other | > 10,000 employees | 20+ | I do not know | 250,000-299,999 | $10,000-$99,999 | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 0, -1 | 1-2 years | Python | Once | 2-3 years | Podcasts (Chai Time Data Science, Linear Digressions, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Journal Publications (traditional publications, preprint journals, etc), Slack Communities (ods.ai, kagglenoobs, etc) | Coursera, edX, DataCamp, University Courses (resulting in a university degree) | Jupyter (JupyterLab, Jupyter Notebooks, etc) , Visual Studio / Visual Studio Code | Microsoft Azure Notebooks | Python, R, SQL, Bash | Ggplot / ggplot2 , Matplotlib , Seaborn | CPUs, GPUs | Linear or Logistic Regression, Convolutional Neural Networks | Automation of full ML pipelines (e.g. Google AutoML, H20 Driverless AI) | Scikit-learn , TensorFlow , Keras , RandomForest |
| 4 | 22-24 | Male | India | Bachelor’s degree | Other | 0-49 employees | 0 | No (we do not use ML methods) | 4,000-4,999 | $0 (USD) | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 1, -1 | < 1 years | Python | Never | < 1 years | YouTube (Cloud AI Adventures, Siraj Raval, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Other | Other | Jupyter (JupyterLab, Jupyter Notebooks, etc) | Google Colab , Google Cloud Notebook Products (AI Platform, Datalab, etc) | Python, SQL | Matplotlib , Plotly / Plotly Express , Seaborn | CPUs, GPUs | Linear or Logistic Regression, Decision Trees or Random Forests, Gradient Boosting Machines (xgboost, lightgbm, etc) | None | Scikit-learn , RandomForest, Xgboost , LightGBM |
| 5 | 50-54 | Male | France | Master’s degree | Data Scientist | 0-49 employees | 3-4 | We have well established ML methods (i.e., models in production for more than 2 years) | 60,000-69,999 | $10,000-$99,999 | Advanced statistical software (SPSS, SAS, etc.), -1, 0, -1, -1, -1 | 20+ years | Java | Never | 10-15 years | YouTube (Cloud AI Adventures, Siraj Raval, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Journal Publications (traditional publications, preprint journals, etc) | None | RStudio , Other | None | Python, R | Ggplot / ggplot2 | CPUs, GPUs | Linear or Logistic Regression, Decision Trees or Random Forests, Gradient Boosting Machines (xgboost, lightgbm, etc), Bayesian Approaches, Convolutional Neural Networks, Generative Adversarial Networks, Recurrent Neural Networks | Automated model selection (e.g. auto-sklearn, xcessiv), Automated hyperparameter tuning (e.g. hyperopt, ray.tune), Automation of full ML pipelines (e.g. Google AutoML, H20 Driverless AI) | Scikit-learn , TensorFlow , Keras , RandomForest, Xgboost , Caret |
| 6 | 22-24 | Male | India | Master’s degree | Data Scientist | 50-249 employees | 20+ | We are exploring ML methods (and may one day put a model into production) | 10,000-14,999 | $100-$999 | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 2, -1 | 3-5 years | Python | 6-24 times | 2-3 years | Kaggle (forums, blog, social media, etc), Course Forums (forums.fast.ai, etc), YouTube (Cloud AI Adventures, Siraj Raval, etc), Podcasts (Chai Time Data Science, Linear Digressions, etc), Journal Publications (traditional publications, preprint journals, etc) | Udacity, Coursera, edX, Kaggle Courses (i.e. Kaggle Learn), Udemy | Jupyter (JupyterLab, Jupyter Notebooks, etc) , Spyder , Notepad++ , Sublime Text | Kaggle Notebooks (Kernels) , Google Colab , Binder / JupyterHub | Python, R, Bash | Matplotlib , Plotly / Plotly Express , Bokeh , Seaborn | CPUs, GPUs | Linear or Logistic Regression, Dense Neural Networks (MLPs, etc), Convolutional Neural Networks, Recurrent Neural Networks | Automated data augmentation (e.g. imgaug, albumentations), Automated feature engineering/selection (e.g. tpot, boruta_py), Automated model selection (e.g. auto-sklearn, xcessiv), Automation of full ML pipelines (e.g. Google AutoML, H20 Driverless AI) | Scikit-learn , TensorFlow , Keras , PyTorch |
| 7 | 22-24 | Female | United States of America | Bachelor’s degree | Data Scientist | > 10,000 employees | 20+ | We recently started using ML methods (i.e., models in production for less than 2 years) | 80,000-89,999 | $0 (USD) | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 3, -1 | 3-5 years | Python | Once | 3-4 years | Hacker News (https://news.ycombinator.com/), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Journal Publications (traditional publications, preprint journals, etc) | Udemy, University Courses (resulting in a university degree) | Jupyter (JupyterLab, Jupyter Notebooks, etc) , Spyder | Microsoft Azure Notebooks , AWS Notebook Products (EMR Notebooks, Sagemaker Notebooks, etc) | Python | Matplotlib , Plotly / Plotly Express | CPUs | Linear or Logistic Regression, Decision Trees or Random Forests, Convolutional Neural Networks | None | Scikit-learn , TensorFlow , Keras , Spark MLib |
| 9 | 55-59 | Male | Netherlands | Master’s degree | Other | 0-49 employees | 1-2 | We are exploring ML methods (and may one day put a model into production) | $0-999 | $100-$999 | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 5, -1 | 5-10 years | Python | Never | < 1 years | Kaggle (forums, blog, social media, etc), Course Forums (forums.fast.ai, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc) | Coursera | Jupyter (JupyterLab, Jupyter Notebooks, etc) | None | Python, SQL | Matplotlib , D3.js , Seaborn | CPUs | Linear or Logistic Regression, Bayesian Approaches, Generative Adversarial Networks | None | Scikit-learn , PyTorch |
| 11 | 30-34 | Male | Germany | Master’s degree | Statistician | 0-49 employees | 5-9 | We recently started using ML methods (i.e., models in production for less than 2 years) | 2,000-2,999 | $1000-$9,999 | Basic statistical software (Microsoft Excel, Google Sheets, etc.), 2, -1, -1, -1, -1 | 5-10 years | R | 2-5 times | 4-5 years | Podcasts (Chai Time Data Science, Linear Digressions, etc) | Coursera | Jupyter (JupyterLab, Jupyter Notebooks, etc) | Code Ocean | R | Matplotlib | CPUs | Bayesian Approaches | Automated data augmentation (e.g. imgaug, albumentations) | Scikit-learn |
| 12 | 30-34 | Male | Germany | Bachelor’s degree | Data Scientist | 50-249 employees | 5-9 | We recently started using ML methods (i.e., models in production for less than 2 years) | 70,000-79,999 | $1000-$9,999 | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 6, -1 | 5-10 years | R | Never | 4-5 years | None | edX | Jupyter (JupyterLab, Jupyter Notebooks, etc) , RStudio | None | Python, R | Ggplot / ggplot2 | CPUs | Linear or Logistic Regression, Decision Trees or Random Forests, Gradient Boosting Machines (xgboost, lightgbm, etc), Bayesian Approaches, Dense Neural Networks (MLPs, etc) | None | Keras , Caret |
| 13 | 30-34 | Male | United States of America | Master’s degree | Product/Project Manager | > 10,000 employees | 20+ | I do not know | 90,000-99,999 | $0 (USD) | Basic statistical software (Microsoft Excel, Google Sheets, etc.), 1, -1, -1, -1, -1 | 3-5 years | Python | Never | 2-3 years | Hacker News (https://news.ycombinator.com/), Reddit (r/machinelearning, r/datascience, etc), Kaggle (forums, blog, social media, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc) | Udacity, Coursera, DataQuest, Kaggle Courses (i.e. Kaggle Learn), Fast.ai, Udemy, University Courses (resulting in a university degree) | Jupyter (JupyterLab, Jupyter Notebooks, etc) , PyCharm , Atom , Notepad++ , Sublime Text | Kaggle Notebooks (Kernels) , Google Colab , Google Cloud Notebook Products (AI Platform, Datalab, etc) , Code Ocean | Python | Matplotlib , Plotly / Plotly Express , Seaborn | None / I do not know | Linear or Logistic Regression, Decision Trees or Random Forests, Bayesian Approaches | None | Scikit-learn , RandomForest |
| | What is your age (# years)? | What is your gender? | In which country do you currently reside? | What is the highest level of formal education that you have attained or plan to attain within the next 2 years? | Select the title most similar to your current role (or most recent title if retired) | What is the size of the company where you are employed? | Approximately how many individuals are responsible for data science workloads at your place of business? | Does your current employer incorporate machine learning methods into their business? | What is your current yearly compensation (approximate $USD)? | Approximately how much money have you spent on machine learning and/or cloud computing products at your work in the past 5 years? | What is the primary tool that you use at work or school to analyze data? | How long have you been writing code to analyze data (at work or at school)? | What programming language would you recommend an aspiring data scientist to learn first? | Have you ever used a TPU (tensor processing unit)? | For how many years have you used machine learning methods? | Who/what are your favorite media sources that report on data science topics? | On which platforms have you begun or completed data science courses? | Which of the following integrated development environments (IDE's) do you use on a regular basis? | Which of the following hosted notebook products do you use on a regular basis? | What programming languages do you use on a regular basis? | What data visualization libraries or tools do you use on a regular basis? | Which types of specialized hardware do you use on a regular basis? | Which of the following ML algorithms do you use on a regular basis? | Which categories of ML tools do you use on a regular basis? | Which of the following machine learning frameworks do you use on a regular basis? |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 19299 | 50-54 | Male | France | Some college/university study without earning a bachelor’s degree | Data Scientist | 0-49 employees | 3-4 | We use ML methods for generating insights (but do not put working models into production) | 100,000-124,999 | $10,000-$99,999 | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 125, -1 | 5-10 years | Python | 6-24 times | 4-5 years | Twitter (data science influencers), Hacker News (https://news.ycombinator.com/), Reddit (r/machinelearning, r/datascience, etc), Kaggle (forums, blog, social media, etc), Course Forums (forums.fast.ai, etc), YouTube (Cloud AI Adventures, Siraj Raval, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Journal Publications (traditional publications, preprint journals, etc), Slack Communities (ods.ai, kagglenoobs, etc) | Udacity, Coursera, edX, Kaggle Courses (i.e. Kaggle Learn), University Courses (resulting in a university degree), Other | Jupyter (JupyterLab, Jupyter Notebooks, etc) , RStudio , PyCharm , Visual Studio / Visual Studio Code | Kaggle Notebooks (Kernels) , Google Colab , Microsoft Azure Notebooks , Binder / JupyterHub | Python, SQL, C++ | Matplotlib , Shiny , Plotly / Plotly Express | TPUs | Linear or Logistic Regression, Decision Trees or Random Forests, Gradient Boosting Machines (xgboost, lightgbm, etc), Bayesian Approaches, Evolutionary Approaches, Dense Neural Networks (MLPs, etc), Convolutional Neural Networks, Generative Adversarial Networks, Recurrent Neural Networks, Transformer Networks (BERT, gpt-2, etc) | Automated data augmentation (e.g. imgaug, albumentations), Automated feature engineering/selection (e.g. tpot, boruta_py), Automated model selection (e.g. auto-sklearn, xcessiv) | Scikit-learn , TensorFlow , PyTorch , Spark MLib |
| 19324 | 25-29 | Male | Nigeria | Doctoral degree | Data Scientist | 250-999 employees | 1-2 | We are exploring ML methods (and may one day put a model into production) | 1,000-1,999 | $100-$999 | Business intelligence software (Salesforce, Tableau, Spotfire, etc.), -1, -1, 337, -1, -1 | < 1 years | Python | Once | < 1 years | Reddit (r/machinelearning, r/datascience, etc), YouTube (Cloud AI Adventures, Siraj Raval, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc) | Udacity, edX, Udemy | Visual Studio / Visual Studio Code | Microsoft Azure Notebooks | Python, R, SQL | Ggplot / ggplot2 | GPUs | Bayesian Approaches | Automated data augmentation (e.g. imgaug, albumentations), Automated hyperparameter tuning (e.g. hyperopt, ray.tune), Automation of full ML pipelines (e.g. Google AutoML, H20 Driverless AI) | Fast.ai |
| 19338 | 18-21 | Male | Nigeria | Bachelor’s degree | Data Analyst | 250-999 employees | 5-9 | I do not know | 5,000-7,499 | $1000-$9,999 | Advanced statistical software (SPSS, SAS, etc.), -1, 253, -1, -1, -1 | < 1 years | R | Never | < 1 years | Hacker News (https://news.ycombinator.com/), Kaggle (forums, blog, social media, etc) | DataCamp | RStudio | None | Python, R | Ggplot / ggplot2 | CPUs | Linear or Logistic Regression | Automated feature engineering/selection (e.g. tpot, boruta_py) | None |
| 19425 | 35-39 | Male | Saudi Arabia | Master’s degree | Data Scientist | > 10,000 employees | 5-9 | We have well established ML methods (i.e., models in production for more than 2 years) | 100,000-124,999 | $10,000-$99,999 | Other, -1, -1, -1, -1, -1 | 5-10 years | R | Never | 10-15 years | YouTube (Cloud AI Adventures, Siraj Raval, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Slack Communities (ods.ai, kagglenoobs, etc) | Coursera, DataCamp | RStudio | None | R, SQL | Ggplot / ggplot2 , Shiny , Plotly / Plotly Express , Leaflet / Folium | CPUs | Gradient Boosting Machines (xgboost, lightgbm, etc) | None | Xgboost , Caret |
| 19442 | 25-29 | Male | Viet Nam | Master’s degree | Data Analyst | 50-249 employees | 1-2 | We are exploring ML methods (and may one day put a model into production) | $0-999 | $1-$99 | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 50, -1 | 1-2 years | Python | Never | 1-2 years | Kaggle (forums, blog, social media, etc), Course Forums (forums.fast.ai, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc) | Udacity, Coursera, Kaggle Courses (i.e. Kaggle Learn), Fast.ai, Udemy, LinkedIn Learning | Jupyter (JupyterLab, Jupyter Notebooks, etc) , Sublime Text | Kaggle Notebooks (Kernels) , Google Colab | Python | Matplotlib , Seaborn | CPUs | Linear or Logistic Regression, Decision Trees or Random Forests, Gradient Boosting Machines (xgboost, lightgbm, etc), Bayesian Approaches | Automated model selection (e.g. auto-sklearn, xcessiv), Automated hyperparameter tuning (e.g. hyperopt, ray.tune) | Scikit-learn , Xgboost |
| 19443 | 25-29 | Male | India | Master’s degree | Data Scientist | 0-49 employees | 1-2 | We recently started using ML methods (i.e., models in production for less than 2 years) | 1,000-1,999 | $100-$999 | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 2838, -1 | 3-5 years | Python | Never | 2-3 years | Hacker News (https://news.ycombinator.com/), Kaggle (forums, blog, social media, etc), YouTube (Cloud AI Adventures, Siraj Raval, etc), Slack Communities (ods.ai, kagglenoobs, etc) | Kaggle Courses (i.e. Kaggle Learn), LinkedIn Learning | Jupyter (JupyterLab, Jupyter Notebooks, etc) , PyCharm , MATLAB , Notepad++ | Google Cloud Notebook Products (AI Platform, Datalab, etc) , AWS Notebook Products (EMR Notebooks, Sagemaker Notebooks, etc) | Python, MATLAB | Matplotlib | CPUs, GPUs | Linear or Logistic Regression, Decision Trees or Random Forests, Convolutional Neural Networks | Automated data augmentation (e.g. imgaug, albumentations) | Scikit-learn , TensorFlow , PyTorch , Spark MLib |
| 19582 | 22-24 | Female | Other | Bachelor’s degree | Other | 50-249 employees | 1-2 | We are exploring ML methods (and may one day put a model into production) | 5,000-7,499 | $100-$999 | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 0, -1 | 1-2 years | Python | Never | 1-2 years | Other | Udacity, Coursera, edX, Kaggle Courses (i.e. Kaggle Learn), University Courses (resulting in a university degree), Other | Jupyter (JupyterLab, Jupyter Notebooks, etc) , Atom , Visual Studio / Visual Studio Code , Spyder | Google Colab | Python | Matplotlib , Seaborn | CPUs, GPUs | Linear or Logistic Regression, Decision Trees or Random Forests, Dense Neural Networks (MLPs, etc), Convolutional Neural Networks | Automated hyperparameter tuning (e.g. hyperopt, ray.tune) | Scikit-learn , TensorFlow , PyTorch |
| 19663 | 25-29 | Male | China | I prefer not to answer | Data Engineer | 250-999 employees | 5-9 | We recently started using ML methods (i.e., models in production for less than 2 years) | 20,000-24,999 | $100-$999 | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 12, -1 | 1-2 years | Python | Once | 1-2 years | Other | Kaggle Courses (i.e. Kaggle Learn) | Jupyter (JupyterLab, Jupyter Notebooks, etc) , PyCharm | Google Colab | Python | Seaborn | GPUs | Dense Neural Networks (MLPs, etc), Recurrent Neural Networks | None | Scikit-learn , TensorFlow , Keras |
| 19690 | 25-29 | Male | Australia | Bachelor’s degree | Other | 1000-9,999 employees | 5-9 | No (we do not use ML methods) | 60,000-69,999 | $10,000-$99,999 | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 14, -1 | 3-5 years | Python | Never | 1-2 years | Hacker News (https://news.ycombinator.com/), Reddit (r/machinelearning, r/datascience, etc), Kaggle (forums, blog, social media, etc), Podcasts (Chai Time Data Science, Linear Digressions, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc) | Coursera, edX, Fast.ai, Udemy | Jupyter (JupyterLab, Jupyter Notebooks, etc) , MATLAB , Visual Studio / Visual Studio Code | None | Python, SQL, MATLAB | Matplotlib , Plotly / Plotly Express , Bokeh , Seaborn | CPUs, GPUs | Linear or Logistic Regression, Decision Trees or Random Forests, Gradient Boosting Machines (xgboost, lightgbm, etc), Bayesian Approaches | None | Scikit-learn , TensorFlow , PyTorch |
| 19716 | 50-54 | Male | France | Bachelor’s degree | Software Engineer | > 10,000 employees | 20+ | We have well established ML methods (i.e., models in production for more than 2 years) | 60,000-69,999 | $0 (USD) | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 25, -1 | 3-5 years | Python | Never | 4-5 years | Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc) | Coursera, edX, Udemy | Jupyter (JupyterLab, Jupyter Notebooks, etc) , Visual Studio / Visual Studio Code | IBM Watson Studio | Python, SQL, Java, Bash | Matplotlib | CPUs | Linear or Logistic Regression, Decision Trees or Random Forests | Automated model selection (e.g. auto-sklearn, xcessiv), Automated hyperparameter tuning (e.g. hyperopt, ray.tune) | Scikit-learn , Spark MLib |
Duplicate rows
Most frequently occurring
| | What is your age (# years)? | What is your gender? | In which country do you currently reside? | What is the highest level of formal education that you have attained or plan to attain within the next 2 years? | Select the title most similar to your current role (or most recent title if retired) | What is the size of the company where you are employed? | Approximately how many individuals are responsible for data science workloads at your place of business? | Does your current employer incorporate machine learning methods into their business? | What is your current yearly compensation (approximate $USD)? | Approximately how much money have you spent on machine learning and/or cloud computing products at your work in the past 5 years? | What is the primary tool that you use at work or school to analyze data? | How long have you been writing code to analyze data (at work or at school)? | What programming language would you recommend an aspiring data scientist to learn first? | Have you ever used a TPU (tensor processing unit)? | For how many years have you used machine learning methods? | Who/what are your favorite media sources that report on data science topics? | On which platforms have you begun or completed data science courses? | Which of the following integrated development environments (IDE's) do you use on a regular basis? | Which of the following hosted notebook products do you use on a regular basis? | What programming languages do you use on a regular basis? | What data visualization libraries or tools do you use on a regular basis? | Which types of specialized hardware do you use on a regular basis? | Which of the following ML algorithms do you use on a regular basis? | Which categories of ML tools do you use on a regular basis? | Which of the following machine learning frameworks do you use on a regular basis? | # duplicates |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 30-34 | Male | Netherlands | Doctoral degree | Research Scientist | 1000-9,999 employees | 20+ | I do not know | 50,000-59,999 | $0 (USD) | Basic statistical software (Microsoft Excel, Google Sheets, etc.), 1, -1, -1, -1, -1 | 5-10 years | Python | Never | < 1 years | Course Forums (forums.fast.ai, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc) | Coursera, Fast.ai | Jupyter (JupyterLab, Jupyter Notebooks, etc) , Atom , MATLAB | Google Colab | Python, MATLAB | Matplotlib | CPUs, GPUs | Convolutional Neural Networks | None | Fast.ai | 2 |
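A duplicate-row listing like the one above can be reproduced by counting identical records. The sketch below uses only the standard library and assumes that "# duplicates" means the number of times a row occurs; the helper name `duplicate_rows` and the shortened toy records are hypothetical.

```python
from collections import Counter


def duplicate_rows(rows):
    """Return {row: occurrence count} for every row that appears more than once.

    Hypothetical helper for illustration; each row is compared on all of its fields.
    """
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}


# Toy records with three fields each (not the full 25 survey columns).
records = [
    ["30-34", "Male", "Netherlands"],
    ["22-24", "Male", "France"],
    ["30-34", "Male", "Netherlands"],
]
dups = duplicate_rows(records)
```

Rows are converted to tuples so they can serve as dictionary keys; a row is reported only if every field matches another row exactly.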